Abstract

Background: At the beginning of the transcription process, the RNA polymerase (RNAP) core enzyme requires a
σ-factor to recognize the genomic location at which the process initiates. Although the crucial role of σ-factors has
long been appreciated and characterized for many individual promoters, we do not yet have a genome-scale
assessment of their function.

Results: Using multiple genome-scale measurements, we elucidated the network of σ-factor and promoter interactions
in Escherichia coli. The reconstructed network includes 4,724 σ-factor-specific promoters corresponding to transcription
units (TUs), representing an increase of more than 300% over what has been previously reported. The reconstructed
network was used to investigate competition between alternative σ-factors (the σ70 and σ38 regulons), confirming the
competition model of σ substitution and negative regulation by alternative σ-factors. Comparison with σ-factor binding in
Klebsiella pneumoniae showed that transcriptional regulation of conserved genes in closely related species is unexpectedly
divergent.

Conclusions: The reconstructed network reveals the regulatory complexity of the promoter architecture in prokaryotic 
genomes, and opens a path to the direct determination of the systems biology of their transcriptional regulatory 
networks. 

Keywords: Escherichia coli, Sigma factor, Network reconstruction, Comparative analysis, Klebsiella pneumoniae, Omics data
Background

The RNA polymerase (RNAP) core enzyme (E) for bacterial
transcription is a catalytic multi-subunit complex (α2ββ′ω),
capable of transcribing portions of the DNA template into
RNA transcripts. At the beginning of the transcription
process, the RNAP core enzyme requires a σ-factor to
recognize the genomic location at which the process initiates
[1-3] (Figure 1a). The σ-factor, a single dissociable
subunit, binds to E, forming a holoenzyme (Eσx, where x denotes
each σ-factor), and orchestrates initiation of promoter-specific
transcription [1]. To date, one housekeeping σ-factor, σ70
(rpoD), and six alternative σ-factors, σ54, σ38, σ32, σ28, σ24,
and σ19 (rpoN, rpoS, rpoH, fliA, rpoE, and fecI, respectively),
have been described in Escherichia coli. Although the importance
of σ-factors and their role in the function of the
RNAP and bacterial transcription are well known, we do
not yet have a genome-wide understanding of the network
of regulatory interactions that the σ-factors comprise in any
species. With systems biology and genome-scale science
emerging and describing the phenotypic functions of bacteria,
it is now possible to comprehensively elucidate the
structure of the σ-factor network. Here, we present the results
from a systems approach that integrates multiple
genome-scale measurements to reconstruct the regulatory
network of σ-factor-gene interactions in E. coli. This reconstruction
is provided here as a resource for the scientific
community.
Figure 1 Molecular basis of transcription and a reconstruction of σ-factor-transcription unit gene (σ-TUG) network from multi-omic
experimental datasets. (a) Diagram shows bacterial transcription process by an RNA polymerase (RNAP) core enzyme and an associated σ-factor.

(b) Four-step process of multi-omic data integration to reconstruct the σ-TUG network. First, we identified RNAP-binding regions (RNAP map) and
σ-factor binding regions (σ map) from RpoB and σ-factor chromatin immunoprecipitation and microarray (ChIP-chip) data (the missing σ24 binding
information was taken from a public database [6]), resulting in the genome-wide holoenzyme binding map (Eσ map). The Eσ map was then
combined with experimental transcription start site (TSS) information (TSS map), resulting in the strand-specific promoter map (P map), which was
integrated with previously reported TU information [7], resulting in the σ-network. With this σ-network, we then performed further analysis, such
as network reprogramming, motif analysis, promoter overlapping, and alternative TSS usage. Subfigure I: IOPR, intensively overlapped promoter
region; OPR, overlapped promoter region; SPR, single promoter region; Orphan, orphan promoter region. Subfigure III and IV: green and brown
circles represent σ70 and σ38, yellow circles represent TUs, and red dots represent genes. Edges show regulatory interactions between elements.

(c) Datasets used for σ-TUG network reconstruction: ChIP-chip dataset with RNAP and six σ-factors, and the TSS dataset. The TSS dataset for
exponential phase was taken from a previous study [9]. TSS subpanel: exp, exponential phase; stat, stationary phase; heat, heat shock; gln,
alternative nitrogen source with glutamine. (d) Magnified examples of rpoD (left panel, genomic region ranging from 3,196 to 3,214 kbp), fecI and
fecRAB (right panel, genomic region ranging from 4,494 to 4,517 kbp).



Results and discussion 

Determination of the genome-wide map of holoenzyme 
binding 

To capture the first step of the transcription cycle, which
is the formation of the Eσx-promoter complex, we obtained
genome-wide location profiles and integrated the
identified RNAP and σ-factor binding sites, leading to
a reconstruction of a genome-scale Eσ-binding region
map (Figure 1b). A genome-wide static map of the entire
group of Eσx-binding sites (Eσx map) was determined by
employing chromatin immunoprecipitation and microarray
(ChIP-chip) of rifampicin-treated cells (Figure 1c),
revealing the active promoter regions in vivo across the E.
coli genome [4,5] (see Methods). A total of 2,129 Eσx-binding
regions were identified, consisting of 727 (34.1%) for
the leading strand, 755 (35.5%) for the lagging strand, and
647 (30.4%) for both strands (that is, divergent promoter
regions) (see Additional file 1: Figure S1).

Although the construction of the Eσx map is informative,
it is not sufficient to produce the σ-specific Eσ-binding
map, in which the promoter-specific role of the σ-factor is
detailed [6]. We thus deployed ChIP-chip assays for the direct
identification of the locations of σ-factor binding across
the E. coli genome. We analyzed E. coli cells grown to
mid-logarithmic phase or to stationary phase under multiple
growth conditions (see Additional file 2: Table S1). Using
data from biological duplicate or triplicate experiments for
each σ-factor ChIP-chip (36 experiments in total), we identified
1,643 targets for σ70, 903 targets for σ38, 312 targets
for σ32, 180 targets for σ54, 51 targets for σ28, and 7 targets
for σ19 (Figure 1c; Figure 2a; see Additional file 3: Table S2;
see Additional file 4: Table S3). We were not able to obtain
a dataset for σ24, and the missing dataset was supplemented
by incorporating 65 σ24 promoter regions from RegulonDB
[6]. For validation, we compared the σ-factor binding regions
with the previously reported promoters regulated by
each σ-factor [6] (Figure 1d; see Additional file 5: Table S5).
Overall, we identified 86% of the previously reported binding
sites and 2,465 new σ-factor binding regions, extending
the current knowledge by over 300% (see Additional file 5:
Table S5).

By integrating the entire set of Eσx and σ-factor binding regions,
we obtained the genome-wide Eσ-binding region
map (Eσ map) comprising 3,161 binding regions (see Additional
file 6: Table S4). Next, each Eσ-binding site was classified
into one of three categories depending on the
number of σ-factors recruited to that site: single Eσ-binding
promoter region (SPR), overlapped Eσ-binding promoter
region (OPR), and intensively overlapped Eσ-binding promoter
region (IOPR) (Figure 1b, d; Figure 2b). For instance,
all σ-factors except σ19 were detected at the promoter
region of the rpoD gene, which encodes σ70; however, only
σ19 was found to bind to the promoter region of the
fecABCDE operon, which encodes the ferric citrate outer
membrane receptor and the ferric citrate ATP-binding
cassette (ABC) transporter (Figure 1d). Over 48% of
Eσ-binding regions identified in this study were overlapped
or intensively overlapped binding regions, indicating
that Eσ switching, or binding of alternative Eσ, at
the same promoter region may be needed to ensure
continued gene expression in response to environmental
changes [2] (Figure 2a).
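
The classification described above can be illustrated with a minimal sketch. The text defines the three categories by the number of σ-factors detected at a site but does not give the cut-offs, so the thresholds `OPR_MIN` and `IOPR_MIN`, the function name, and the toy input below are assumptions made only for illustration.

```python
from collections import Counter

# Hypothetical thresholds: the categories depend on the number of
# sigma-factors detected at a region, but the exact cut-offs are assumed here.
OPR_MIN = 2   # assumed: two sigma-factors -> overlapped (OPR)
IOPR_MIN = 3  # assumed: three or more -> intensively overlapped (IOPR)


def classify_region(sigma_factors):
    """Return 'SPR', 'OPR', or 'IOPR' for the sigma-factors bound at one region."""
    n = len(set(sigma_factors))
    if n == 0:
        raise ValueError("region has no sigma-factor binding")
    if n >= IOPR_MIN:
        return "IOPR"
    if n >= OPR_MIN:
        return "OPR"
    return "SPR"


# Toy input modelled on the two examples in the text (rpoD and the fec operon),
# plus one invented shared promoter.
regions = {
    "rpoD promoter": ["s70", "s54", "s38", "s32", "s28", "s24"],
    "fec promoter": ["s19"],
    "toy shared promoter": ["s70", "s38"],
}
labels = {name: classify_region(sf) for name, sf in regions.items()}
counts = Counter(labels.values())
```

With these assumed thresholds, the rpoD promoter region is labelled IOPR and the fec promoter region SPR, matching the examples in the text.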

Determination of the genome-wide promoter map 

We found that 69% of the Eσ-binding regions exhibited
strand specificity, with the balance being observed as
divergent promoter regions (see Additional file 1: Figure
S1). Although the assignment of the RNAP-binding regions
to each strand was achievable using the expression
profiles [7], it was difficult to assign σ-factors directly
to the promoter regions because information on the
cis-acting sequence elements, such as the -10 and -35
boxes in the promoter regions, is not yet fully elucidated
for each σ-factor. To identify the promoter elements
more precisely with strand specificity and a better resolution
than ChIP-chip, we performed transcription start
site (TSS) profiling at the genome scale with single-nucleotide
resolution. A genome-wide TSS map was generated
from TSS profiling by rapid amplification of cDNA
ends (RACE) followed by deep sequencing after 5'
triphosphate enrichment [8-10] for three conditions:
stationary phase, heat shock, and alternative nitrogen
source with glutamine. TSS profiling data for exponential
phase were taken from a previous study [9], and processed
together with the other three datasets. The TSS
map was then integrated with the Eσ map to build a
strand-specific promoter map (P map) (Figure 1b-d).
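
As a rough illustration of this integration step, the sketch below assigns each TSS to the Eσ-binding region that contains it and carries over the σ-factors detected there. The data layout, the function name `assign_tss_to_regions`, and the coordinates are hypothetical, and the regions are assumed not to overlap one another; this is not the authors' pipeline.

```python
from bisect import bisect_right


def assign_tss_to_regions(regions, tss_positions):
    """Build promoter records from binding regions and TSSs.

    regions: list of (start, end, sigmas) with inclusive coordinates,
             assumed non-overlapping (an assumption of this sketch).
    tss_positions: list of (position, strand) tuples.
    Returns a list of (position, strand, sigmas) for TSSs inside a region.
    """
    regions = sorted(regions, key=lambda r: r[0])
    starts = [r[0] for r in regions]
    promoters = []
    for pos, strand in sorted(tss_positions):
        # Index of the last region whose start is <= pos.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and regions[i][0] <= pos <= regions[i][1]:
            promoters.append((pos, strand, regions[i][2]))
    return promoters


# Invented coordinates for illustration only.
toy_regions = [(100, 300, ("s70",)), (500, 800, ("s70", "s38"))]
toy_tss = [(150, "+"), (400, "-"), (650, "-")]
p_map = assign_tss_to_regions(toy_regions, toy_tss)
```

In this toy case the TSS at position 400 falls outside every binding region and is dropped, whereas the TSS at 650 inherits both σ70 and σ38 from its region.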

Reconstruction of sigma factor regulons and their 
overlaps 

The P map was combined with the transcription unit
(TU) map [7], resulting in the σ-factor-TU gene (σ-TUG)
network (Figure 2d, e; see Additional file 7: Table
S6). A network of interactions between the σ-factors was
extracted from the σ-TUG network (Figure 2c). σ70 and
σ24 are the only σ-factors that auto-regulate themselves,
and σ70 and σ38 regulate most of the other σ-factors,
reflecting their roles as housekeeping σ-factors in exponential
and stationary phase [1]. Gene essentiality data
are available for E. coli [11], and only rpoD has been
found to be an essential σ-factor. This network feature is
consistent with the fact that σ70 regulates the highest
number of σ-factors, including itself. In addition, σ70 has
the largest regulon, and this cannot be replaced by the
other σ-factors (Figure 2d).

The significant overlap of σ-factor regulons leads to
the fundamental questions: what is the molecular basis
for the overlap, and what are the consequences of having
a complicated σ-factor network? Because each σ-factor
has an individual ability to recognize cis-acting sequence
elements in the promoter region (such as the -10 box or
-35 box), we analyzed the sequence motifs of the promoter
regions (see Additional file 1: Figure S2). As in
previous studies [12-14], the sequence motifs of σ70 and
σ38 were found to have a similar -10 box sequence
(TAtaaT and CTAtacT); however, unlike the σ70 sequence
motif, σ38 did not have a distinctive -35 box.
The similarity in the -10 box sequence motifs of the
σ70- and σ38-specific promoters and the degenerate nature
of the -35 box sequence of the σ38-specific

Figure 2 Properties of the reconstructed σ-factor network in Escherichia coli. (a) Extensive overlapping between σ-factor binding sites. For each
σ-factor, σ70, σ38, σ54, σ32, σ28, σ24, and σ19, we identified 1,643, 903, 180, 312, 51, 65, and 7 binding regions, respectively. The number of binding regions
overlapping between any two σ-factors is shown. For instance, 805 binding regions that were bound by both σ70 and σ38 were identified. (b) Number of
promoters bound by multiple σ-factors showed a complex overlap between different σ-factors, indicating complicated alternative σ-factor usage. (c) A
regulatory network between σ-factors in E. coli, in which σ70 and σ38 regulate expression of most of the seven σ-factors; σ70 and σ24 auto-regulate
themselves. (d) Reconstruction of a three-layered network of σ-factors, transcription units (TUs), and genes. This network shows that many transcription start sites
(TSSs) are shared by multiple σ-factors, suggesting possible competition between σ-factors for promoter binding. (e) Examples of thrLABC and hypBCDE-fhlA
transcription units that are differently regulated by multiple σ-factors, and result in different TUs containing different sets of genes. For instance, TU001 is
regulated by σ70 and contains four genes, thrLABC, while TU0005 is regulated by σ38 and has only two genes, thrB and thrC.



promoters explains, in part, how a large overlap between
σ70 and σ38 regulons is possible.

With the structure and molecular details of the σ-TUG
network in hand, we were able to study its functional states.
Because of the limited number of E complexes in a growing
E. coli cell [1], each σ-factor must compete for association
with an E complex to initiate transcription. Thus,
it becomes important which promoters Eσx binds, and how
frequently it does so [15]. We found that the promoter sets
specific to each σ-factor overlap extensively, and a large
number of promoters bound by multiple σ-factors share the
same TSS (Figure 2a,d). These findings raise questions
about the molecular mechanism of σ-factor competition
for binding to the E complex and subsequently to the promoter,
and how that affects transcription initiation.

Sigma factor competition in overlapped promoters 

σ-factors are believed to act predominantly as positive
effectors, as they recognize the cis-acting elements in
promoters that enable the Eσx to bind. Interestingly,
however, σ38 has a negative effect on the expression level
of some genes, even though it acts mainly as a positive
effector [16,17]. To shed light on the molecular mechanisms
of σ-factor competition by σ38, we performed ChIP-chip
experiments for RpoB with wild type (WT) E. coli and
its isogenic rpoS knock-out strain to obtain differential Eσx
binding to the genome. The differential binding intensity of
the Eσx to the promoters of 1,139 genes, whose transcription
is directly affected by σ38, is shown in Figure 3a. If
σ38-specific promoters were bound only by σ38, then the E
complex recruited to those promoters would be very scarce.
However, the majority of σ38-specific promoters showed
significant levels of signal for Eσx binding in the σ38 deletion
strain, indicating recruitment of the Eσx and implying
rescue of transcription activity (Figure 3a).

To confirm that the detected binding of the Eσx leads
to transcription, we performed expression profiling with
WT and rpoS knock-out strain cells under stationary
phase conditions (Figure 3b; see Additional file 8: Table
S7). Most genes having σ38-specific promoters were
expressed. Of 1,139 genes with σ38-specific promoters,
178 (16%) showed up-regulated expression when rpoS
was removed and 291 (26%) showed expression that
was down-regulated more than two-fold (t-test P-value
<0.05). The remaining 58% of the genes showed no statistical
significance in expression (fold change <2) or were
not expressed in either strain. In the absence of rpoS,
σ38-specific promoters became active in transcription,
leading to expression of the corresponding genes, but at
a different level for 469 (41%) of these 1,139 genes.
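
The thresholds stated above (a change of more than two-fold with a t-test P-value below 0.05) can be written as a small decision rule. The sketch below is illustrative only: the function name, the use of linear-scale expression values, and the example numbers are assumptions, and the P-values are taken as given rather than computed. The last lines recompute the reported percentages from the gene counts in the text.

```python
def classify_gene(expr_wt, expr_ko, p_value, fold=2.0, alpha=0.05):
    """Label a gene 'up', 'down', or 'unchanged' in the knock-out versus WT.

    expr_wt, expr_ko: positive, linear-scale expression values (assumed).
    p_value: P-value from a t-test performed elsewhere (assumed input).
    """
    ratio = expr_ko / expr_wt
    if p_value < alpha and ratio > fold:
        return "up"
    if p_value < alpha and ratio < 1.0 / fold:
        return "down"
    return "unchanged"


# Percentages recomputed from the counts reported in the text.
total, up, down = 1139, 178, 291
pct_up = round(100 * up / total)                   # 16
pct_down = round(100 * down / total)               # 26
pct_changed = round(100 * (up + down) / total)     # 41
```

For example, a gene with expression 100 in WT and 250 in the knock-out at P = 0.01 is labelled "up", whereas the same change at P = 0.2 is labelled "unchanged".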

Expression of σ38-regulated genes was recovered
when rpoS was knocked out; however, it is not
known which of the other σ-factors takes over the role
of σ38. As σ70 shared the largest portion of promoters
with σ38, it is reasonable to assume that σ70 would replace
σ38 when σ38 is missing. In E. coli MC4100, it was
reported that σ70 is abundant during
stationary phase [18]. Similarly, we found that E. coli K-12
MG1655 also showed high protein expression of σ70 during
stationary phase in the WT and ΔrpoS strains (Figure 3c; see
Additional file 1 for detailed description). In addition, we
examined how many genes bound by σ38 in the WT strain
were bound by σ70 when rpoS was deleted. We found that
about 89% of those genes were bound by σ70 when σ38 was
missing (see Additional file 1: Figure S3). This unexpectedly
high rate of σ-factor substitution explains how the
majority of genes directly bound by σ38 recovered their
expression when rpoS was knocked out (Figure 3b). However,
it is still unclear how some of those genes were
up-regulated.

Because approximately 89% of these genes were bound
by σ70, we measured the intensity of σ70 binding in ΔrpoS
during stationary phase with ChIP-chip experiments, and
compared the binding intensity between up-regulated and
down-regulated genes (Figure 3d; see Additional file 1:
Figure S4). This measurement showed that up-regulated
genes were bound more strongly by σ70 (P-value of Wilcoxon
rank sum test was 4.80 × 10^-18), suggesting that
strong σ70 binding resulted in increased transcription. This
finding indicates that the presence of σ38 actually contributed
to repressing the transcriptional expression of some
genes, presumably by competition for shared promoters
between σ70 and σ38.

Comparative analysis of the sigma factor network in 
closely related species 

With the detailed reconstruction of the σ-TUG network in
E. coli, we could now address the issue of the difference
immunoprecipitation and microarray (ChIP-chip) intensity; the three red lines represent the first, second, and third quantiles. (b)
Comparison of transcriptional expression of genes in wild type (WT) and ΔrpoS strains. Of 1,139 genes with σ38-specific promoters, 178 had
up-regulated transcription (red background) and 291 had down-regulated transcription (blue background). (c) Expression level of σ70 and
σ38 was measured at both the transcriptional and translational levels. The amount of σ70 was abundant in exponential and stationary phase,
between such networks in closely related species. Genome-wide
identification of TSSs of two gamma-Proteobacteria,
E. coli and Klebsiella pneumoniae, revealed that promoter
regions upstream of orthologous genes are differently organized
in the two species, resulting in different usage of TSSs
[9]. As σ-factors recognize sequence elements of promoters,
and these are directly upstream of TSSs, it is important to
determine any differences in σ-factor binding patterns.



Whereas the E. coli genome contains seven σ-factors, K.
pneumoniae is known to have only five, missing fliA and
fecI, which are found in E. coli. The five σ-factors that the
two species have in common are highly conserved in terms
of amino acid sequence similarity: 95.9% for rpoD, 98.5%
for rpoS, 89.8% for rpoN, 95.1% for rpoH, and 96.3%
for rpoE. Promoter sequence motifs examined from the
TSSs were found to be identical between E. coli and K.
pneumoniae, suggesting that the sequence motifs for each
orthologous σ-factor are identical [9,19]. However, the different
organization of upstream regulatory regions of the
two species and the different pattern of transcription initiation
indicate the possibility of significantly diverse σ-factor
binding.

To investigate the binding patterns of two major σ-factors,
rpoD and rpoS, we analyzed ChIP-chip datasets for
σ70 under exponential phase and σ38 under stationary phase
from cells grown in glucose minimal media as described previously
[19]. E. coli and K. pneumoniae have 4,513 and 5,305 genes,
respectively, and 2,876 coding genes were defined as orthologs
by two-way reciprocal alignment. Binding of σ70 and
σ38 under specified conditions upstream of those orthologous
genes was analyzed and clustered (Figure 4a). Of the
2,876 orthologous genes, 60% showed the same binding
patterns (584 had both σ-factors bound, 213 had σ70 bound,
102 had σ38 bound, and 847 had neither factor bound).
These two closely related bacteria, E. coli and K. pneumoniae,
share the majority of their gene contents, with most
of the open reading frames having highly conserved sequences.
However, conserved genes showed significantly
different σ-factor binding patterns, indicating diverse gene
regulation by different transcription initiation (Figure 4c,d).
Interestingly, in some cases, altered binding of σ-factors
was associated with changes in TU organization, suggesting
even more diverse regulation between the two species. Although
two major σ-factors were found to bind differently
upstream of orthologous genes, regulation between σ-factors
remained unchanged, except for the two missing
σ-factors, fliA and fecI, in K. pneumoniae (see Additional
file 1: Figure S5). Thus, regulation of gene expression by
σ-factors may evolve faster than regulation among the σ-factors
themselves.
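
The comparison of binding patterns between orthologs amounts to a simple tally, sketched below. The function names, the boolean encoding of binding, and the three example orthologs are hypothetical; only the four reported counts and the total of 2,876 orthologous genes come from the text.

```python
def binding_pattern(has_s70, has_s38):
    """Encode sigma-70/sigma-38 binding upstream of a gene as a label."""
    if has_s70 and has_s38:
        return "both"
    if has_s70:
        return "s70"
    if has_s38:
        return "s38"
    return "none"


def conserved_fraction(ortholog_pairs):
    """Fraction of ortholog pairs with the same pattern in both species.

    ortholog_pairs: list of ((s70, s38) in species 1, (s70, s38) in species 2).
    """
    same = sum(binding_pattern(*a) == binding_pattern(*b) for a, b in ortholog_pairs)
    return same / len(ortholog_pairs)


# Invented example: two conserved orthologs and one divergent ortholog.
toy_pairs = [((True, True), (True, True)),
             ((True, False), (False, True)),
             ((False, False), (False, False))]
toy_fraction = conserved_fraction(toy_pairs)

# Counts reported in the text for orthologs with identical patterns.
reported_same = 584 + 213 + 102 + 847
reported_fraction = reported_same / 2876
```

The reported counts sum to 1,746 of 2,876 orthologs, a fraction of about 0.607, in line with the roughly 60% stated in the text.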

Conclusions 

Genome-scale measurements enabled us to reconstruct the
σ-TUG network in E. coli K-12 MG1655. This network is
at the core of transcriptional regulation in bacteria. Its reconstruction
has enabled the assessment of its topological
characteristics, functional states, and limited comparison
with related species. With the integration of a growing body
of experimental data on transcription factor (TF) binding
and activity, the resource provided here opens up the possibility
of developing a comprehensive reconstruction of the
entire transcriptional regulatory network in E. coli, which
would simultaneously describe the function of σ-factors
and TFs that produce the entire expression state of the
organism.

Methods 

Bacterial strains, media, and growth conditions 

E. coli K-12 MG1655 and its isogenic knock-out strains
were used in this study. The deletion mutants (ΔrpoS
and ΔrpoN) were generated by a λ Red and FLP-mediated
site-specific recombination system [20]. E. coli
cells were harvested at mid-exponential phase (optical
density at 600 nm (OD600nm) of approximately 0.5) with
the exception of stationary phase experiments (OD600nm
approximately 1.5). Glycerol stocks of E. coli strains were
inoculated into M9 or W2 minimal media [21] (for
nitrogen-limiting condition) with glucose (2 g/l) and cultured
overnight at 37°C with constant agitation. Cultures
were then diluted 1:100 into 50 ml of fresh minimal
media, and cultured at 37°C to appropriate cell density.
For heat-shock experiments, cells were grown to
mid-exponential phase at 37°C, and half of the culture was
used as a control, while the remaining culture was transferred
into pre-warmed (50°C) media and incubated for
10 minutes. For nitrogen-limiting condition, ammonium
chloride in the minimal media was replaced by glutamine
(2 g/l).

Total RNA isolation 

Cell culture (3 ml) was mixed with 6 ml RNAprotect
Bacteria Reagent (Qiagen, Valencia, CA, USA). Samples
were mixed immediately by vortexing for 5 seconds, incubated
for 5 minutes at room temperature, then centrifuged
at 5000 x g for 10 minutes. The supernatant was
decanted, and any residual supernatant was removed by
inverting the tube once onto a paper towel. Total RNA
samples were then isolated using an RNeasy Plus Mini
Kit (Qiagen) in accordance with the manufacturer's instructions.
Samples were then quantified using a NanoDrop
1000 spectrophotometer (Thermo Scientific),
and the quality of the isolated RNA was checked by
visualization on agarose gels and by measuring the ratio
of the absorbance at 260 and 280 nm (A260/A280 ratio)
of the sample (>1.8).

Transcriptome analysis 

Transcriptome datasets with oligonucleotide tiling microarrays
for WT E. coli K-12 MG1655 grown under
four conditions (exponential phase, stationary phase,
heat shock, and nitrogen-limiting condition) were taken
from a previous study [7]. In order to obtain a transcriptome
dataset for E. coli deletion mutant ΔrpoS, a previously
described protocol [9] was adapted for the deletion
mutant in the current study. Briefly, 10 µg of purified
total RNA sample was reverse transcribed to cDNA
with amino-allyl dUTP. The amino-allyl-labeled cDNA
samples were then coupled with Cy3 monoreactive
dyes (Amersham). Cy3-labeled cDNAs were fragmented
to the 50 to 300 bp range with DNase I (Epicentre).
High-density oligonucleotide tiling arrays consisting of
371,034 50-mer probes spaced 25 bp apart across the
whole E. coli genome were used (Roche NimbleGen).
Hybridization, washing, and scanning were performed
Figure 4 Conservation and divergence in transcriptional regulation by σ-factors. (a) Clustering σ-factor binding patterns revealed
conserved and divergent transcriptional regulation of 2,876 orthologous genes. (b) crp is regulated by σ70 and σ38 in both species, showing
regulation conservation. (c) In Escherichia coli, cutA is a part of the dcuA-cutA-dipZ transcription unit (TU) and is regulated by σ70 and σ38,
while cutA in Klebsiella pneumoniae is the first gene in its TU, and is directly bound by σ70. (d) In K. pneumoniae, panD is a part of the panBCD
TU, which is regulated by σ70. However, in E. coli, panD is separated from panBC by yadD, making another distinct TU. These two TUs are
both regulated by σ70. (e) A genomic region containing ydeA and marC in both species was inverted, and this genomic inversion was
accompanied by a transcription regulation switch between σ70 and σ38.



in accordance with the manufacturer's instructions.
Three biological replicates were used for stationary
phase in glucose minimal media. Probe level data were
normalized with a robust multiarray analysis (RMA)
algorithm without background correction, as implemented
in NimbleScan 2.4 software.



TSS-sequencing by modified 5' RACE, and deep 
sequencing 

The raw TSS dataset for exponential phase was taken
from a previous study [9]. For the other three conditions
(stationary phase, heat shock, and nitrogen-limiting condition),
the previously described TSS determination
protocol [9] was adapted for E. coli K-12 MG1655. To
enrich intact 5' tri-phosphorylated mRNAs from the total
RNA, 5' mono-phosphorylated rRNA and any degraded
mRNA were removed by treatment with a Terminator
5'-Phosphate Dependent Exonuclease (Epicentre) at 30°C for
1 hour. The reaction mixture consisted of 10 µg purified
total RNA, 1 µl terminator exonuclease, reaction buffer,
and RNase-free water up to a total of 20 µl. The reaction was
terminated by adding 1 µl of 100 mM EDTA (pH 8.0). Intact
tri-phosphorylated RNAs were precipitated by adding
1/10 volume of 3 M sodium acetate (pH 5.2), 3 volumes of
ethanol, and 2 µl of 20 mg/ml glycogen. RNA was precipitated
at -80°C for 20 minutes and pelleted, washed with
70% ethanol, dried in a Speed-Vac for 7 minutes without
heat, and resuspended in 20 µl nuclease-free water. The
tri-phosphorylated RNA was then treated with RNA
5'-Polyphosphatase (Epicentre) to generate 5'-end
mono-phosphorylated RNA for adaptor ligation. The RNA sample
from the previous step was mixed with 2 µl 10x reaction
buffer, 0.5 µl SUPERase-In (Ambion), 1 µl RNA
5'-Polyphosphatase, and RNase-free water up to 20 µl. The mixture
was incubated at 37°C for 30 minutes and the reaction was
stopped by phenol-chloroform extraction. Ethanol precipitation
was carried out to isolate the RNA as described
above. To ligate the 5' small RNA adaptor (Table 1) to the
5'-end of the mono-phosphorylated RNA, the enriched
RNA samples were incubated with 100 µM of the adaptor
and 2.5 U of T4 RNA ligase (New England Biolabs). cDNAs
were synthesized from the adaptor-ligated mRNAs as template,
using a modified small RNA RT primer from Illumina
(Table 1) and Superscript II Reverse Transcriptase (Invitrogen).
The RNA was mixed with 25 µM modified small
RNA RT primer and incubated at 70°C for 10 minutes and
then at 25°C for 10 minutes. RT was carried out at 25°C for
10 minutes, 37°C for 60 minutes, and 42°C for 60 minutes,
followed by incubation at 70°C for 10 minutes. The RT reaction
mixture consisted of 5x first-strand buffer, 0.01 M
DTT, 10 mM dNTP mix, 30 U SUPERase-In (Ambion),
and 1500 U Superscript II (Invitrogen). After the reaction,
RNA was hydrolyzed by adding 20 µl of 1 N NaOH and incubating
the mixture at 65°C for 30 minutes. The reaction
mixture was neutralized by adding 20 µl of 1 N HCl. The
cDNA samples were amplified using a mixture of 1 µl



Table 1 Primers used in the study

Primer                  Direction   Sequence 5'->3'
Small RNA adaptor                   GUUCAGAGUUCUACAGUCCGACGAUC
Small RT primer                     CAAGCAGAAGACGGCATACGANNNNNNNNN
Amplification primers   Forward     AATGATACGGCGACCACCGACAGGTTCAGAGTTCTACAGTCCGA
                        Reverse     CAAGCAGAAGACGGCATACGA

RT, reverse transcription.



cDNA, 10 µl Phusion HF buffer (NEB), 1 µl dNTPs
(10 mM), 1 µl SYBR Green (Qiagen), 0.5 µl HotStart Phusion
(NEB), and 5 pM small RNA PCR primer mix. The
amplification primers used are shown in Table 1. The PCR
mixture was denatured at 98°C for 30 seconds and cycled
to 98°C for 10 seconds, 57°C for 20 seconds, and 72°C for
20 seconds. Amplification was monitored by a LightCycler
(Bio-Rad) and stopped at the beginning of the saturation
point. Amplified DNA was run on a 6% Tris-borate-EDTA
(TBE) gel (Invitrogen) by electrophoresis, and DNA fragments
ranging from 100 to 300 bp were size-fractionated.
Gel slices were dissolved in two volumes of EB buffer
(Qiagen) and 1/10 volume of 3 M sodium acetate (pH 5.2).
The amplified DNA was then ethanol-precipitated and
resuspended in 15 µl DNase-free water (USB). The final
samples were then quantified using a NanoDrop 1000
spectrophotometer (Thermo Scientific).

Sequencing, data processing, and mapping 

The data processing and mapping of the sequencing re- 
sults to obtain potential TSSs was performed exactly as 
described previously [9]. In brief, the amplified cDNA li- 
braries from two biological replicates for each condition 
were sequenced on an Illumina Genome Analyzer. Se- 
quence reads for cDNA libraries were aligned to the E. 
coli K-12 MG1655 genome (NC_000913) using Mosaik 
[22] with the following arguments: hash size = 10, mis- 
matach = 0, and alignment candidate threshold = 30 bp. 
Only reads that aligned to a unique genomic location 
were retained. Two biological replicates were processed 
separately, and only sequence reads presented in both 
biological replicates were considered for further process- 
ing. The genome coordinates of the 5 '-end of these 
uniquely aligned reads were defined as potential TSSs, 
and of these, only TSSs with the strongest signal within 
10 bp window were kept, in order to remove possible 
noise signals. TSSs with signals that were 40% or greater 
of the strongest signal upstream of an annotated gene 
were considered as multiple TSSs. The strongest signal 
was defined as the potential TSS that had the highest 
number of reads out of all the TSSs upstream of an annotated 
gene. For further analysis, TSSs lying within RNAP-binding 
regions (see Additional file 4: Table S3) were used 
for integration with σ-factor binding information. 
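
The TSS selection rules above can be illustrated with a minimal Python sketch. It is not the pipeline used in [9]; the function names, the greedy window rule, and the toy read counts are assumptions made for illustration, and a single strand and a single annotated gene are assumed.

```python
def strongest_in_window(counts, window=10):
    """Keep the strongest TSS within each `window`-bp neighborhood.

    `counts` maps a 5'-end genome coordinate to its read count.
    Assumption: a greedy rule is used, visiting positions from the
    highest to the lowest count and dropping any position closer than
    `window` bp to a position that was already kept.
    """
    kept = []
    for pos in sorted(counts, key=lambda p: (-counts[p], p)):
        if all(abs(pos - k) >= window for k in kept):
            kept.append(pos)
    return {p: counts[p] for p in sorted(kept)}


def call_multiple_tss(counts, fraction=0.4):
    """Return TSSs whose signal is at least `fraction` of the strongest one.

    `counts` holds the candidate TSSs upstream of one annotated gene.
    """
    strongest = max(counts.values())
    return sorted(p for p, c in counts.items() if c >= fraction * strongest)


# Toy example (hypothetical coordinates and read counts).
candidates = {100: 50, 104: 10, 130: 25, 160: 8}
filtered = strongest_in_window(candidates)   # 104 is dropped (4 bp from 100)
tss_calls = call_multiple_tss(filtered)      # 160 is below 40% of 50 reads
```

In this toy case the position at 104 is removed as probable noise around the stronger signal at 100, and the positions at 100 and 130 are reported as multiple TSSs for the gene.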

Chromatin immunoprecipitation and microarray analysis 

Briefly, the immunoprecipitated RNAP-associated DNA 
fragments were fluorescently labeled and hybridized to a 
high-density oligonucleotide tiling microarray representing 
the entire E. coli genome [5]. To identify in vivo 
binding regions of the RNAP complex and six σ-factors (σ70, 
σ54, σ38, σ32, σ28, and σ19), we isolated DNA fragments 
bound to those RNAP subunits from formaldehyde-crosslinked 
E. coli cells, using ChIP with six different 
antibodies that specifically recognize each subunit 
(NeoClone). An E. coli strain harboring RpoH-8myc 
was constructed as previously described [23,24], and 
used for the σ38 ChIP-chip with anti-c-myc antibody 
(9E10; Santa Cruz Biotechnologies). Cells were grown 
under appropriate conditions (see Additional file 2: Table 
S1) and harvested. The immunoprecipitation (IP) DNA 
and mock-IP DNA were hybridized onto high-resolution 
whole-genome tiling microarrays, which contained a total 
of 371,034 oligonucleotides with 50-bp probes overlapping 
by 25 bp on both forward and reverse strands. Tiling 
microarrays were hybridized, washed, and scanned in accordance 
with the manufacturer's instructions (Roche 
NimbleGen). To increase the number of promoter 
regions identified, datasets were generated under 
multiple growth conditions, for a total of 45 ChIP-chip 
experiments (36 for σ-factors and 9 for RNAP), and 
analyzed (see Additional file 2: Table S1). We were not able to 
obtain results from the ChIP-chip experiment for σ24. This 
could be because the expression level of σ24 was not high 
enough, or because the conditions were not appropriate to activate 
σ24. To compensate for the missing dataset, we used known 
binding information for σ24 from the public database [25]. 

ChIP-chip data analysis 

The analysis was performed as previously described 
[7,26]. In brief, TF-binding regions were identified by 
using the peak-finding algorithm built into the NimbleScan 
software (Roche NimbleGen). Processing of ChIP-chip data 
was performed in three steps: normalization, IP/mock-IP 
ratio computation (in log2 scale), and enriched-region 
identification. The log2 ratios of each spot in the microarray 
were calculated from the raw signals obtained from both 
Cy5 and Cy3 channels, and then the values were scaled by 
the Tukey bi-weight mean. That is, the log2 ratio of Cy5 (IP DNA) to 
Cy3 (mock-IP DNA) for each point was calculated from 
the signals, and the bi-weight mean of this log2 ratio was then 
subtracted from each point. Each log-ratio dataset (from 
duplicate or triplicate samples) was used to identify TF- 
binding regions using the software (width of sliding win- 
dow = 300 bp). Our approach to identify the TF-binding re- 
gions was to first determine the binding locations from 
each dataset, and then combine the binding locations from 
at least five of six datasets to define a binding region, using 
the recently developed MetaScope visualization software 
and genome browser [27]. 
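
The normalization and region-combining steps can be sketched as follows. This is an illustrative sketch only and not the NimbleScan or MetaScope implementation: the one-step Tukey bi-weight with tuning constant c = 5, the half-open interval convention, and all function names are assumptions.

```python
import math
from statistics import median


def tukey_biweight_mean(values, c=5.0, eps=1e-4):
    """One-step Tukey bi-weight mean (assumed form, tuning constant c = 5)."""
    values = list(values)
    m = median(values)
    mad = median([abs(v - m) for v in values])
    num = den = 0.0
    for v in values:
        u = (v - m) / (c * mad + eps)
        if abs(u) < 1:                      # points far from the median get weight 0
            w = (1 - u * u) ** 2
            num += w * v
            den += w
    return num / den


def centered_log2_ratios(cy5_ip, cy3_mock):
    """log2(IP / mock-IP) per probe, centered by the bi-weight mean."""
    ratios = [math.log2(ip / mock) for ip, mock in zip(cy5_ip, cy3_mock)]
    center = tukey_biweight_mean(ratios)
    return [r - center for r in ratios]


def consensus_regions(datasets, min_support=5):
    """Regions covered by peaks from at least `min_support` datasets.

    Each dataset is a list of half-open (start, end) peaks that do not
    overlap one another, so coverage depth equals the number of
    supporting datasets.
    """
    events = []
    for peaks in datasets:
        for start, end in peaks:
            events.append((start, 1))
            events.append((end, -1))
    regions, depth, region_start = [], 0, None
    for pos, delta in sorted(events):       # at equal positions, ends come first
        previous = depth
        depth += delta
        if previous < min_support <= depth:
            region_start = pos
        elif depth < min_support <= previous and pos > region_start:
            regions.append((region_start, pos))
    return regions


# Toy example: five of six hypothetical datasets agree on one region.
peaks = [[(100, 400)], [(120, 420)], [(90, 380)],
         [(110, 390)], [(150, 450)], [(1000, 1200)]]
regions = consensus_regions(peaks, min_support=5)
```

The bi-weight mean down-weights probes far from the median, so a small number of strongly enriched probes does not shift the baseline that is subtracted from every log2 ratio.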

Western blotting 

E. coli K-12 MG1655 and ΔrpoS deletion mutant cells 
were grown in M9 minimal media with 0.2% glucose, 
and were harvested every 2 hours from mid-exponential phase 
to stationary phase. Cells were pelleted by centrifugation, 
and were lysed with lysozyme in a lysis 
buffer (10 mM Tris-HCl (pH 7.5), 100 mM NaCl, and 
1 mM EDTA). The supernatant was decanted after centrifugation 
to remove unlysed cells. The concentration 
of total protein in the lysate was measured with the Qubit 
Protein Assay Kit (Invitrogen), and 5 µg of total protein 
sample were mixed with 4x SDS-PAGE sample loading 
buffer (Invitrogen) and 10 mM DTT, then boiled at 90°C 
for 5 minutes. The boiled samples were separated by 
electrophoresis with 10% Bis-Tris gel in MOPS buffer, and 
transferred onto Hybond-ECL membrane (Amersham Biosciences). 
The membrane was briefly washed in TBS with 
0.1% Tween-20 (1x TBS-T) for 5 minutes on a rocker, and 
then treated with 2% skim milk in TBS-T buffer for 1 hour 
with gentle shaking. The membrane was washed twice with 
TBS-T for 5 minutes each on a rocker, and then it was 
sliced into three pieces, containing RpoB, σ70, and σ38, 
respectively. Sliced membranes were treated with anti-RpoB, 
anti-σ70, and anti-σ38 antibodies (1:10,000 dilution; NeoClone) 
for 1 hour on a rocker. The membrane slices were washed 
once in TBS-T for 15 minutes, followed by three washes of 
5 minutes each, and then treated with HRP-conjugated 
anti-mouse IgG (1:10,000 dilution; Amersham Biosciences) 
for 30 minutes on a rocker, followed by one 
wash in TBS-T for 15 minutes and three washes of 5 minutes 
each. Chemiluminescent detection was applied to the 
peroxidase conjugates on the membrane to detect the amount 
of RpoB, σ70, and σ38. 

Availability of supporting data 

All raw and processed data files have been deposited to 
Gene Expression Omnibus (accession number GSE46740). 
(RNAP) binding. Figure S2. Sequence motifs of σ-factors. Figure S3. The 
missing. Figure S4. Examples of up-regulated and down-regulated genes 
when rpoS was knocked out. Figure S5. Comparison of transcriptional 
regulation by two major σ-factors, σ70 and σ38, in two closely related 
bacteria. Figure S6. Comparison of transcriptional level of σ-factors and 
binding regions in Escherichia coli. 

Additional file 4: Table S3. Binding intensities of RNA polymerase 
(RNAP) and σ-factor binding regions. 
different conditions. 